I. INTRODUCTION

Prior to costly operational testing of a system consisting 
of hardware and its embedded software, it would be highly 
desirable to know whether these two major components are 
sufficiently reliable to support such testing. Specifically, 
this is equivalent to asking whether the software has reached 
a state of maturity such that unforeseen faults (bugs, errors, 
system crashes, etc.) are not likely to occur during 
operational test of the entire system, or later, during a 
systemic mission. 

Estimation of hardware reliability is relatively well- 
understood. Unfortunately, software reliability or maturity 
prediction is not as well understood at this time. The 
ANSI/IEEE definition of software reliability is the ability of 
a program to perform a required function under stated 
conditions for a stated period of time (IEEE, 1984). Since
testing software has an associated cost, whether in
computer run time, labor costs, lost market share resulting
from late delivery of a product or, in the case of military
equipment, sacrificed range-testing time and aborted missions,
there is a finite time allocated for testing and removal of
faults (bugs). A moderate-sized program with 264 branches
would have 2^264 independent paths (greater than the estimated
number of atoms in the universe). Obviously, it is infeasible
to test each path (Dalal and Mallows, 1988). Testing and
debugging costs are estimated to range from 50% to 80% of the 
costs for development of a working version of software 
(Beizer, 1984) . The constraints of a finite time period for 
testing and the cost of testing are excellent incentives for 
prompt and accurate determination of software reliability. 
Put in the form of a question: when can testing be stopped 
and the product delivered with a high level of confidence that 
the customer will be satisfied? 

Software reliability estimation is based on the results of 
testing. Software testing can be broken down into four major 
categories: unit, integration, system and regression testing. 
Unit testing is usually done by the programmer in an informal 
manner. Integration testing is done in an orderly progression 
such that the software elements are combined and tested until 
the entire software package has been tested. System testing 
is integration of hardware and software to verify that the 
system meets specified requirements. Regression testing is 
retesting to detect faults that may have been introduced 
during program modification (Hernandez, 1989). One purpose of
testing is to produce quantitative measures of software
error-proneness after effort has been expended in the integration
testing, system testing, and fault removal phases.

Software testing, a follow-on to hardware reliability
prediction, has been of considerable importance and interest
from the mid-1960's to the present. The Navy's Operational
Test and Evaluation Force recently (January, 1992) held a
symposium for DoD agencies to discuss and exchange ideas and 
methodologies on software testing and reliability. There are 
two basic differences between hardware and software 
reliability predictions. Hardware prediction usually assumes 
independence of failures, and, after some point, the 
reliability measuring process does not affect the failure 
rate. Software reliability prediction models should assume 
interdependence of unit failures, and that testing improves 
reliability. Removing a program fault or bug during
developmental testing reduces the likelihood that a fault will
become operative later in an operational setting and
cause a mission to abort. The software fault-prevalence and
appearance prediction problem has been judged to be inherently
more difficult than hardware reliability prediction (Beizer,
1984).

There are several software reliability models that will be 
discussed later. Beizer in his seminal work Software System 
Testing and Quality Assurance (Beizer, 1984) summed up the 
similarities of the models best. 

1. Most models assume a fixed but unknown number of 
faults when testing. 

2. Faults are universally assumed to be independent (some
of the later models, Schneidewind's Software Reliability
Model for example, do not necessarily make this
assumption).

3. Most models assume perfect debugging. That is, the
debugging process introduces no new faults. However, some
of the later models take into account that not all
detected faults will be fixed, and that the debugging
process itself may introduce new faults (Littlewood and
Verrall's Bayesian Reliability Growth Model takes into
account imperfect debugging).

4. Most models assume that test time and calendar time
are the same.

5. The models assume that failure rate is proportional to
the faults remaining. This implicitly means that faults
are assumed to cause single failures and each failure can
be related to one fault.

6. The models assume path homogeneity. That is, data are
entered randomly and such data uniformly exercise all
code. This is in direct contradiction to the reality that
most paths cover a small percentage (say under 10%) of
the code.

The difference between the models lies in the degree to
which these assumptions hold true, i.e., the type of random
process according to which the failures occur, and how data are
fitted to the models (Beizer, 1984).

The models that are described in Chapter II do not 
necessarily perform well for all types of data. There is no 
"silver bullet" (Brooks, 1986) that will take on all comers 
successfully. One model may predict reliability well for one 
data source but not another. The users of the models must 
take into consideration the predictive quality of a model 
prior to basing decisions on the output of the model (Abdalla
et al., 1986) and (Goel, 1985). One possible way to do this is
to analyze the data using various models. The manager selects
the model that demonstrates the best predictive qualities,
i.e., the model that appears to best fit the data and provide
useful results. The choice is difficult because it is
conducted in an atmosphere of uncertainty. 

Our hypothesis is that software reliability can be
predicted, but with error. It is important to take account of
the variabilities and uncertainties that are inevitably
present, at least those associated with sampling (finite
data); the most serious errors, however, may be associated
with model choice. To test this hypothesis of predictability we
analyze sources of fault (error or bug) data using a
modification of the BELLCORE MODEL (Dalal and Mallows, 1988)
to estimate the reliability of the particular software project
and the quality of the prediction produced by the model.
Parametric estimates are made by maximum likelihood and also
by use of an approximate Bayesian technique. Error estimates
are made by a resampling technique known as bootstrapping.

The parametric bootstrap technique was used in the
aftermath of the Challenger disaster to analyze the O-rings
that failed. Although the analysis was done on hardware, the
methodology that we propose in Chapter III and the appendix is
similar. The analysis of the O-rings showed that the bootstrap
90% confidence limits gave an expected catastrophic failure
rate of at least 13% at temperatures of less than 31 degrees,
but less than a 2% failure rate at temperatures above 60
degrees (Dalal et al., 1989). Had the NASA decision makers had
this information available to them, the option of postponing
the launch might have been taken more seriously and the
disaster prevented. The analogous question for the software
manager to consider is whether the predicted number of faults
to occur in some specified time is acceptable. It is hoped that
the wrong decision will not have consequences as severe as the
Challenger disaster. The techniques that we describe provide a
quantitative tool for the software manager to substantiate the
decision to schedule (or postpone) system operational testing.

In Chapter II, we briefly describe several software 
reliability prediction models that have been proposed in order 
to provide a basis of understanding of the discussion. In 
Chapter III and the appendix, we present the model fitting 
procedure, the method used to determine the quality of the 
prediction, the resulting data obtained from the analysis, and 
methods to improve this methodology from the perspective of a 
software manager. In Chapter IV, our conclusions are provided 
and directions for future research are suggested. 

II. SURVEY OF SOFTWARE RELIABILITY METHODOLOGIES



This survey is concerned with only two categories of 
software reliability models: those for time between errors
(TBE), and those for fault count (number of errors in a
specified time).

A. TIME BETWEEN ERRORS (TBE) 

TBE reliability assessments attempt to predict the mean 
time between failures (MTBF) for the ith failure based on that
for the (i-1)th failure. The TBE can be measured in either
central processing unit (CPU) time or wall-clock time.
Wall-clock time can be misleading: it can elapse regardless of
whether or not the program is running. From this information
the software manager can gain confidence that the software 
will exhibit the operational capability to complete its 
mission: to operate without failure for a mission time. A 
system that experiences multiple, severe software errors that 
prevent the system from completing its operational mission is 
not ready for costly live exercises as in operational testing. 
For example, a system that is supposed to detect, track and 
engage a missile during a scenario of five minutes' duration, 
but whose software experiences a severe fault every thirty 
seconds on average, is obviously not ready to conduct an 
expensive live exercise or actual mission. Here are some
models that attempt to predict (mean or average) time to
failure.

1. Jelinski and Moranda Model

Jelinski and Moranda developed the "De-Eutrophication 
Model" (Moranda and Jelinski, 1972), (Farr, 1983). The 
assumptions are: 

• The rate of fault detection is proportional to the current 
fault content of a program. 

• All faults are equally likely to occur and are independent 
of each other. 

• Each fault is of the same severity as any other fault. 

• The fault rate remains constant over the interval between 
fault occurrences. 

• The software is operated in a manner similar to 
anticipated operational usage. 

• The faults are corrected instantly, without introduction 
of new faults into the program. 

The hazard rate for the ith fault is

$$ Z(t_i) = \theta\,[N-(i-1)], \tag{2.1} $$

where: N = total number of faults initially in the system
       i = ith fault to occur
       $\theta$ = proportionality constant.

$X_i = t_i - t_{i-1}$ is the time between the ith and the (i-1)st
fault and is assumed to have an exponential distribution with
rate $Z(t_i)$:

$$ f(X_i) = \theta\,[N-(i-1)]\, e^{-\theta [N-(i-1)] X_i}. \tag{2.2} $$

The likelihood function for the parameters $\theta$ and N is

$$ L(X_1,\ldots,X_n) = \prod_{i=1}^{n} \theta\,[N-(i-1)]\, e^{-\theta [N-(i-1)] X_i}. \tag{2.3} $$

Taking the partial derivatives of ln(L) with respect to N (N
is allowed to assume any real value as a convenient
approximation) and $\theta$, and then setting the equations equal to
zero, the solutions for the following set of equations are
obtained as maximum likelihood estimates for N and $\theta$ (N is
estimated by numerical techniques, then used to solve for $\theta$):

$$ \hat{\theta} = \frac{n}{\hat{N}\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} (i-1) X_i}, \tag{2.4} $$

$$ \sum_{i=1}^{n} \frac{1}{\hat{N}-(i-1)} = \frac{n}{\hat{N} - \dfrac{1}{\sum_{i=1}^{n} X_i} \sum_{i=1}^{n} (i-1) X_i}. \tag{2.5} $$

The estimate for the mean time between failure (MTBF) for the
(i+1)st fault occurrence is

$$ \widehat{MTBF}_{i+1} = \frac{1}{\hat{Z}(t_{i+1})} = \frac{1}{\hat{\theta}\,(\hat{N}-i)}. \tag{2.6} $$

The data required to use the Jelinski-Moranda model are the
observed times of the fault occurrence (the $t_i$'s), or the times
between the faults (the $X_i$'s).
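
As an illustration, the maximum likelihood equations of the Jelinski-Moranda model can be solved numerically along the following lines. This is a minimal sketch and not the procedure of any cited author: the function names `jm_fit` and `jm_mtbf_next`, the use of bisection for N, and the sample times between faults are all assumptions made for the example. The sketch first finds the N that balances the two sides of the equation for N, then computes the proportionality constant from it.

```python
def jm_fit(x, max_n=1e6):
    """Maximum likelihood fit of the Jelinski-Moranda model (sketch).

    x[i] is the time between fault i and fault i+1 using 0-based
    indexing, so the 0-based index i plays the role of (i-1) in the text.
    Returns (N_hat, theta_hat).
    """
    n = len(x)
    total = sum(x)
    weighted = sum(i * xi for i, xi in enumerate(x))  # sum of (i-1) * X_i
    c = weighted / total

    def g(big_n):
        # Left side minus right side of the likelihood equation for N.
        return sum(1.0 / (big_n - i) for i in range(n)) - n / (big_n - c)

    # Just above n-1 the term 1/(N-(n-1)) dominates, so g is positive there.
    lo, hi = (n - 1) + 1e-9, float(n)
    while g(hi) > 0.0:              # expand the bracket until g changes sign
        hi *= 2.0
        if hi > max_n:
            raise ValueError("no finite N: data show no reliability growth")
    for _ in range(200):            # plain bisection on g
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    n_hat = 0.5 * (lo + hi)
    theta_hat = n / (n_hat * total - weighted)
    return n_hat, theta_hat


def jm_mtbf_next(n_hat, theta_hat, faults_seen):
    """Predicted mean time to the next fault after `faults_seen` faults."""
    return 1.0 / (theta_hat * (n_hat - faults_seen))


if __name__ == "__main__":
    gaps = [10, 11, 12, 14, 16, 19]     # hypothetical times between faults
    n_hat, theta_hat = jm_fit(gaps)
    print(n_hat, theta_hat, jm_mtbf_next(n_hat, theta_hat, len(gaps)))
```

Because N is treated as a real number, the estimate can fall below the number of faults already observed, in which case the predicted MTBF is not meaningful; the sketch leaves that check to the caller.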

2. Schick-Wolverton Model 

The hazard rate for the Schick-Wolverton model
(Schick and Wolverton, 1978) and (Farr, 1983) is proportional
to the number of faults in the program and the amount of
testing time. An assumption of the model is that as more
testing is completed the probability of detecting faults
increases because of "zeroing in" on the areas of code where
the errors lie. The assumptions are:

• The rate of fault detection is proportional to the current 
fault content and to the amount of time expended in 
testing . 

• All faults are equally likely to occur 

• All faults are independent of each other 

• All faults are of the same severity 

• The software is operated in a manner similar to the 
anticipated operational usage 

• Perfect fault correction occurs. 

The hazard function is

$$ Z(X_i) = \theta\,[N-(i-1)]\,X_i, \tag{2.7} $$

where: $X_i$ = the amount of time spent testing between the
             occurrence of the ith and the (i-1)st fault
       N = total number of faults initially in the program
       $\theta$ = proportionality constant.

The reliability function of $X_i$ is

$$ R(X_i) = e^{-\theta [N-(i-1)] X_i^2/2}. \tag{2.8} $$

The density function of $X_i$ is

$$ f(X_i) = -R'(X_i) = \theta\,[N-(i-1)]\,X_i\, e^{-\theta [N-(i-1)] X_i^2/2}. \tag{2.9} $$

If $X_i^2/2$ is replaced by $Y_i$ the model is formally identical to
the Jelinski-Moranda model previously described. In fact,
substitution of any known function of $X_i$ allows transformation
to the Jelinski-Moranda model. N and $\theta$ are estimated by
MLE's:

$$ \hat{\theta} = \frac{2n}{\sum_{i=1}^{n} [\hat{N}-(i-1)]\,X_i^2}, \tag{2.10} $$

$$ \sum_{i=1}^{n} \frac{1}{\hat{N}-(i-1)} = \hat{\theta} \sum_{i=1}^{n} \frac{X_i^2}{2}. \tag{2.11} $$

The estimate for the mean time between failure (MTBF) for the
(i+1)st occurrence is

$$ \widehat{MTBF}_{i+1} = \sqrt{\frac{\pi}{2\hat{\theta}\,(\hat{N}-i)}}. \tag{2.12} $$

The data requirements are the time of the fault occurrence, $t_i$,
or the time between the ith and (i-1)st fault.

3. Geometric Model

The Geometric model (Moranda, 1975) and (Farr, 1983)
is a modification of the Jelinski-Moranda "De-Eutrophication"
model. It differs from that model as follows: it does not
assume a fixed number of faults in the program, and the faults
are not equally likely to occur because as debugging
progresses faults become harder to detect. The assumptions
are:

• There is an infinite number of total faults (the program 
is never totally fault free) . 

• All faults do not have the same chance of detection. 

• Detections of faults are independent. 

• The software is operated in a manner similar to 
anticipated operational usage. 

• The fault detection rate forms a geometric progression and 
is constant between faults. 



The hazard rate for the ith fault is

$$ Z(t) = D\,\theta^{\,i-1}, \tag{2.13} $$

where: t = time between the ith and the (i-1)th fault
       D = initial hazard rate
       $\theta$ = fault detection rate ($0 < \theta < 1$)
       n = the nth fault to occur.

$X_i$ = time between the ith and the (i-1)st fault. The $X_i$ are
independently and exponentially distributed with rate $Z(t)$,
so the density function of $X_i$ is

$$ f(X_i) = D\,\theta^{\,i-1}\, e^{-D \theta^{\,i-1} X_i}. \tag{2.14} $$

D and $\theta$ are estimated by MLE's:

$$ \hat{D} = \frac{\hat{\theta}\, n}{\sum_{i=1}^{n} \hat{\theta}^{\,i} X_i}, \tag{2.15} $$

$$ \frac{\sum_{i=1}^{n} i\,\hat{\theta}^{\,i} X_i}{\sum_{i=1}^{n} \hat{\theta}^{\,i} X_i} = \frac{n+1}{2}. \tag{2.16} $$

Equation (2.16) is solved for $\theta$, and that value is substituted
into (2.15) to find D. From these equations the MTBF until
the (n+1)st fault occurs after n faults have occurred can be
obtained:

$$ \widehat{MTBF}_{n+1} = \frac{1}{\hat{D}\,\hat{\theta}^{\,n}}. \tag{2.17} $$

The data requirements are the time of the ith fault ($t_i$), or
the time between the faults ($X_i$), for i = 1,2,...,n.

4. Use of Time Between Errors (TBE) Models

The TBE for models in this category can be measured in
either wall-clock time or CPU time. The models may be used to
predict the expected time to the next failure. Confidence
limits on the expected value should be used to obtain a range
of time to the next failure. The software manager should be
asking: is the expected time to the next failure longer
than the time required for operational testing of the software
within the overall system? If the time required for
operational testing of the system is greater than the mean
time to failure for the (i+1)th failure then the prudent
software manager should consider postponing operational
testing in favor of continued developmental activity and
testing.
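
To make the use of a TBE model concrete, the Geometric model can be fitted and its predicted MTBF compared with a required operating time roughly as follows. This is a hedged sketch rather than a published implementation: the function names, the bisection search for the fault detection rate, the sample times between faults, and the required durations are assumptions chosen for illustration. The search relies on the fact that the weighted mean index in the likelihood equation increases with the detection rate, so a root in (0, 1) exists only when later gaps tend to be longer.

```python
def geometric_fit(x):
    """Maximum likelihood fit of the Geometric model (sketch).

    x[i-1] is the time between fault i-1 and fault i (1-based i).
    Returns (D_hat, theta_hat).
    """
    n = len(x)
    idx = range(1, n + 1)

    def h(theta):
        # Weighted mean of i with weights theta**i * X_i, minus (n+1)/2.
        w = [theta ** i * xi for i, xi in zip(idx, x)]
        return sum(i * wi for i, wi in zip(idx, w)) / sum(w) - (n + 1) / 2.0

    if h(1.0) <= 0.0:
        raise ValueError("no root in (0, 1): data show no reliability growth")
    lo, hi = 1e-12, 1.0             # h(lo) < 0 < h(hi)
    for _ in range(200):            # bisection on the monotone function h
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    d = theta * n / sum(theta ** i * xi for i, xi in zip(idx, x))
    return d, theta


def geometric_mtbf_next(d, theta, n):
    """Predicted mean time to fault n+1 after n faults have occurred."""
    return 1.0 / (d * theta ** n)


def ready_for_operational_test(mtbf, required_time):
    """Decision rule from the text: proceed only if MTBF exceeds the need."""
    return mtbf > required_time


if __name__ == "__main__":
    gaps = [10, 11, 12, 14, 16, 19]     # hypothetical times between faults
    d, theta = geometric_fit(gaps)
    mtbf = geometric_mtbf_next(d, theta, len(gaps))
    print(d, theta, mtbf, ready_for_operational_test(mtbf, 5.0))
```

A point estimate alone is a weak basis for the decision; as noted above, confidence limits on the expected time to the next failure should accompany it.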

B. FAULT COUNT MODELS 

Fault count models use the number of faults that occurred 
in a testing interval to determine the expected number of 
faults in the next testing interval. Software managers can 
employ this method by simply counting the number of faults in 
a given test period (e.g., day, week, or month), provided test
exposures are the same. This provides insight into how well 
the testing process is working. 

1. Generalized Poisson Model 

The Generalized Poisson Model (Schafer et al., 1979),
(Farr, 1983) is similar to the Jelinski-Moranda and
Schick-Wolverton models but uses fault count observations in fixed,
equal-length intervals rather than times between faults. The
assumptions are: 



• The expected number of faults occurring in any time 
interval is proportional to the fault content (number of 
bugs remaining) at the time of testing, and to the amount 
of time that has been previously spent in testing. The 
actual number of faults that appear is assumed to be 
Poisson distributed. 

• All faults are equally likely to occur and are independent 
of each other. 

• Each fault is of the same severity. 

• The software is operated in a manner similar to the 
anticipated operational usage. 
• The faults are corrected at the ends of the testing
intervals. (Note: Faults discovered in one test interval 
may be corrected at another test interval; the only 
restriction is that the fault correction come at the end 
of the testing intervals.) 



Testing intervals are of length $x_i$, and $f_i$ faults occur during
the ith interval. At the end of the ith interval a total of
$M_i$ faults have been corrected.

The expected number of faults in the ith interval is

$$ E(f_i) = \theta\,(N - M_{i-1})\, g_i(x_1, x_2, \ldots, x_i), \tag{2.18} $$

where: $\theta$ = proportionality constant
       N = initial number of faults
       $g_i$ = function of the amount of testing time spent
             previously and currently and is nondecreasing;
             as testing progresses more faults are found.

Specifically,

$$ g_i(x_1, x_2, \ldots, x_i) = x_i^{\alpha}, \tag{2.19} $$

where $\alpha$ is assumed known.

$f_i$ is Poisson with mean $\theta (N - M_{i-1}) g_i$. N and $\theta$ are
estimated by MLE's:

$$ \hat{\theta} = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} (\hat{N} - M_{i-1})\, g_i}, \tag{2.20} $$

$$ \sum_{i=1}^{n} \frac{f_i}{\hat{N} - M_{i-1}} = \hat{\theta} \sum_{i=1}^{n} g_i. \tag{2.21} $$

These non-linear equations must be solved for $\theta$ and N. From
this the expected number of errors in the (n+1)st test
interval can be obtained,

$$ E(f_{n+1}) = \hat{\theta}\,(\hat{N} - M_n)\, g_{n+1}(x_1, \ldots, x_{n+1}), \tag{2.22} $$

where: $x_{n+1}$ is the anticipated testing time for the (n+1)st
test interval.

The data requirements for this model are the lengths of the
test intervals ($x_i$), the total number of faults corrected at
the end of a test interval ($M_i$), and the number of faults
discovered in each interval ($f_i$).

2. Non-homogeneous Poisson Process Model

The Non-homogeneous Poisson Process Model (NHPP) (Goel and
Okumoto, 1979) and (Farr, 1983) assumes that the fault counts
for testing intervals follow a Poisson distribution. The
expected number of faults in the Poisson process model is
proportional to the number of faults left in the program. The
assumptions are:

• The software is operated in a manner similar to the 
anticipated operational usage. 

• The numbers of faults detected, $f_i$, in any test
interval $(t_{i-1}, t_i)$ are independent for any finite
collection of times $t_1 < t_2 < \cdots < t_i < \cdots < t_{m-1} < t_m$.

• Faults are of the same severity.

• Faults are equally likely to be detected.

• The cumulative number of faults detected at any time t,
N(t), has a Poisson distribution with mean m(t). The
mean, m(t), is the expected number of faults to occur for
any time period (0,t) and is proportional to the expected
number of undetected faults at time t.

• m(t) is bounded. 



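The specific mean function used is

$$ m(t) = a\,(1 - e^{-bt}), \tag{2.23} $$

and $f_i$ is the number of faults in the ith interval,

$$ f_i = N(t_i) - N(t_{i-1}), \tag{2.24} $$

where: a = expected total number of faults to be
           eventually detected.

a and b can be estimated by MLE's:

$$ \hat{a} = \frac{\sum_{i=1}^{m} f_i}{1 - e^{-\hat{b} t_m}}, \tag{2.25} $$

$$ \frac{t_m e^{-\hat{b} t_m} \sum_{i=1}^{m} f_i}{1 - e^{-\hat{b} t_m}} = \sum_{i=1}^{m} \frac{f_i \left( t_i e^{-\hat{b} t_i} - t_{i-1} e^{-\hat{b} t_{i-1}} \right)}{e^{-\hat{b} t_{i-1}} - e^{-\hat{b} t_i}}. \tag{2.26} $$

From the estimates of a and b the expected number of faults in
the next (m+1)st test interval is estimated to be

$$ m(t_{m+1}) - m(t_m) = \hat{a}\left( e^{-\hat{b} t_m} - e^{-\hat{b} t_{m+1}} \right). \tag{2.27} $$

The data required for this model are the fault counts of each
test interval ($f_i$) and the time of the test interval ($t_i$).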

3. Schneidewind's Software Reliability Model

Schneidewind's model (Schneidewind, 1975) and (Farr,
1983) maintains that as testing progresses the fault detection
process changes. The later faults are therefore more useful
in determining future fault counts. The model allows for
three approaches.



1. Utilize all the fault counts from the m intervals. 

2. The first (s-1) intervals are ignored and only the s 
through m interval fault counts are considered. 

3. The fault counts of the first (s-1) intervals are summed,
and the fault counts from the remaining s through
m intervals are treated individually. Denote the sum of
the fault counts in the first s-1 intervals by:

$$ F_{s-1} = \sum_{i=1}^{s-1} f_i. \tag{2.28} $$



Method 1 is used when the analyst feels that all intervals 
will be useful. Method 2 can be used when a significant 
change in the fault detection process has occurred at 
approximately the (s-l)st interval. Method 3 attempts to 
combine the effects of both approaches. The assumptions for 
all methods are the same: 

• The fault counts for each interval are independent of each 
other. 

• The fault correction rate is proportional to the number of 
faults to be corrected. 

• The software is operated in a manner similar to the 
anticipated operational usage. 

• The mean number of detected faults decreases from one 
interval to the next. 

• Intervals are all of the same length.



• The rate of fault detection is proportional to the number
of faults remaining. The fault detection process is
assumed to be a non-homogeneous Poisson process with an
exponentially decreasing appearance and detection rate.



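The rate of change of the number of faults detected in the ith
interval is

$$ d_i = \alpha\, e^{-\beta i}. \tag{2.29} $$

The cumulative mean number of faults that occurs up to and
including interval i is

$$ D_i = \frac{\alpha}{\beta}\left(1 - e^{-\beta i}\right). \tag{2.30} $$

The mean number of faults for the ith interval is

$$ m_i = D_i - D_{i-1} = \frac{\alpha}{\beta}\left(e^{-\beta (i-1)} - e^{-\beta i}\right). \tag{2.31} $$

$\alpha$ and $\beta$ can be determined by MLE's:

$$ \hat{\beta} = \ln(y), \tag{2.32} $$

$$ \hat{\alpha} = \frac{\hat{\beta}\, F_m}{1 - e^{-\hat{\beta} m}}. \tag{2.33} $$

For Method 1, y is the solution to:

$$ \frac{1}{y-1} - \frac{m}{y^m - 1} = \frac{A}{F_m}, \tag{2.34} $$

where:

$$ F_m = \sum_{i=1}^{m} f_i, \tag{2.35} $$

$$ A = \sum_{i=0}^{m-s} (s+i-1)\, f_{s+i} \quad \text{(with } s = 1 \text{ for Method 1)}. \tag{2.36} $$

For Method 2, y is the solution to

$$ A\,y^{m-s+2} - (A + F_{s,m})\,y^{m-s+1} + \left((m-s+1)F_{s,m} - A\right) y + \left(A + F_{s,m} - (m-s+1)F_{s,m}\right) = 0, \tag{2.37} $$

where:

$$ A = \sum_{i=0}^{m-s} i\, f_{s+i}, \tag{2.38} $$

$$ F_{s,m} = \sum_{i=s}^{m} f_i, \tag{2.39} $$

$$ \hat{\alpha} = \frac{\hat{\beta}\, F_{s,m}}{1 - e^{-\hat{\beta}(m-s+1)}}. \tag{2.40} $$

For Method 3, y is the solution to

$$ \frac{(s-1)\,F_{s-1}}{y^{s-1} - 1} + \frac{F_{s,m}}{y-1} - \frac{m\,F_m}{y^m - 1} = A, \tag{2.41} $$

where: A is the same as Method 1 and $F_{s,m}$ is the same as
Method 2. From the MLE's of $\alpha$ and $\beta$ the expected number of
faults in the (m+1)st interval is

$$ E(f_{m+1}) = \frac{\hat{\alpha}}{\hat{\beta}}\left(e^{-\hat{\beta} m} - e^{-\hat{\beta}(m+1)}\right). \tag{2.42} $$

The time needed to detect a total number of M faults is

$$ t_M = \frac{1}{\hat{\beta}} \log\left[\frac{\hat{\alpha}}{\hat{\alpha} - \hat{\beta} M}\right]. \tag{2.43} $$

The data needed for this model are the fault counts for each
interval and a history of the testing process in order to
determine the interval at which testing procedures may have
altered significantly.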

4. Use of Fault Count Models 

Fault count models use the number of faults that occur 
in some testing interval. The models in this category predict 
the expected number of faults to occur in some additional time 
interval. Confidence limits on the expected number should be 
used to obtain a range of the predicted number of faults to 
occur for that time interval. Since there can never be a one 
hundred percent guarantee of perfect software, the software 
manager should be asking: is the predicted number of faults 

to occur for the time interval of interest acceptable for 
operational testing? If the predicted number of faults to 
occur is too great then the prudent software manager should 
postpone operational testing in favor of continued 
developmental activity and testing. 

C. SOFTWARE RELIABILITY MODELS 

The number of software reliability models continues to
grow. Assumptions have broadened to reflect the reality of
the software development process with increased accuracy. The
assumptions of some models described appear to be limiting.
The assumption that all faults are of the same severity can be
worked around by modeling faults according to severity. The
assumption that all faults are equally likely to occur and
independent of each other can be relaxed by assuming low
severity faults occur more frequently than high severity
faults, while faults of the same severity class are considered
equally likely to occur. The assumption of instantaneous fault
correction can be avoided by not counting faults which were
previously detected (and counted at time of initial
detection), but were not corrected (Farr, 1983).

Software managers need to be aware of the limitations and
underlying assumptions of the various models that
are available. The data that is needed to fit the models is
critical to reliable results. The data collection needs to be 
an accurate reflection of the meaningful historical testing of 
the software. Some of the data that should be collected is 
computer usage time, testing intensity, extent of the software 
that was tested (was the entire system tested or just a 
particular module), and milestones in the software's 
development (are requirements changed or added midway through 
the development of the software?) and, of course, the cost of 
testing. 

This study illustrates the use of a particular reliability 
model. Some of the specific questions that this thesis 
addresses are: How is a software reliability model used?
What type of information does a model require? What kind of 
decision can a software manager make based on the results of 
the reliability model? 

In today's fiscal environment software managers should 
have a "warm fuzzy feeling" substantiated by quantitative 
results for their product prior to initiating costly
full-scale, live operational testing.

III. DATA ANALYSIS



A. MODEL DEVELOPMENT 

The model that is applied in this thesis is based on the
assumption that the rate of error occurrence is a
non-stationary Poisson process (NSPP) (Dalal and Mallows, 1988).
The model is identical to the Schneidewind model, and is
fitted according to Method 1, which assumes that all fault
data is of equal value. Let N(t) be the number of faults that
occur in (0,t), where t is software running time. The
probability that n faults occur by time t is
given by:

$$ P\{N(t) = n\} = e^{-\lambda(t)}\, \frac{(\lambda(t))^{n}}{n!}, \tag{3.1} $$

where $\lambda(t) = \lambda\,(1 - e^{-\mu t})$. A test time, $t_s$, was chosen. This
length of time is divided into periods of length $\Delta = t_s/J$, where
J is the total number of intervals. The jth interval is such that
$(j-1)\Delta < t \le j\Delta$. The number of observed counts (faults) in the
jth interval is $n_j$. The probability distribution for the
number of faults in $((j-1)\Delta, j\Delta]$ is

$$ P\{N_j = N(j\Delta) - N((j-1)\Delta) = n_j\} = e^{-\lambda_j}\, \frac{\lambda_j^{\,n_j}}{n_j!}, \quad n_j = 0, 1, 2, \ldots \tag{3.2} $$

where

$$ \lambda_j = E[N_j] = \lambda\,(1 - e^{-\mu j \Delta}) - \lambda\,(1 - e^{-\mu (j-1) \Delta}), \tag{3.2a} $$

$$ \lambda_j = \lambda\, e^{-\mu (j-1) \Delta}\,(1 - e^{-\mu \Delta}). \tag{3.2b} $$

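The parameters $\mu$ and $\lambda$ are estimated by maximum likelihood.
The likelihood function is

$$ L(\lambda, \mu) = \prod_{j=1}^{J} e^{-\lambda_j}\, \frac{\lambda_j^{\,n_j}}{n_j!}. \tag{3.3} $$

The natural log of $L(\lambda,\mu)$ is

$$ l(\lambda, \mu) = \ln(L) = -\sum_{j=1}^{J} \lambda_j + \sum_{j=1}^{J} n_j \ln \lambda_j - \sum_{j=1}^{J} \ln(n_j!). \tag{3.4} $$

The partial derivatives of $l(\lambda,\mu)$ with respect to $\lambda$ and $\mu$ are
taken and set equal to zero. This allows $\lambda$ to be written in
terms of $\mu$ and $n(t_s)$, the total number of counts to occur up
to time $t_s$, as

$$ \lambda = \frac{n(t_s)}{1 - e^{-\mu t_s}}. \tag{3.5} $$

$\lambda$ is substituted into the partial derivative of $l$ with respect
to $\mu$ to give

$$ \frac{\partial l}{\partial \mu} = -\frac{n(t_s)\, t_s\, e^{-\mu t_s}}{1 - e^{-\mu t_s}} - \Delta\, \bar{n}(t_s) + \frac{n(t_s)\, \Delta\, e^{-\mu \Delta}}{1 - e^{-\mu \Delta}} = 0, \tag{3.6} $$

where

$$ \bar{n}(t_s) = \sum_{j=1}^{J} (j-1)\, n_j. \tag{3.7} $$

$\mu$ can now be solved for from the following equation:

$$ \frac{\Delta e^{-\mu \Delta}}{1 - e^{-\mu \Delta}} - \frac{t_s e^{-\mu t_s}}{1 - e^{-\mu t_s}} = \frac{\Delta\, \bar{n}(t_s)}{n(t_s)}. \tag{3.8} $$

This equation closely resembles Schneidewind's result; see
(2.41). Since $t_s = \Delta J$, equation (3.8) becomes

$$ \frac{e^{-\mu \Delta}}{1 - e^{-\mu \Delta}} - \frac{J e^{-\mu t_s}}{1 - e^{-\mu t_s}} = \frac{\bar{n}(t_s)}{n(t_s)}, \tag{3.9} $$

then,

$$ \frac{e^{-\mu \Delta}}{1 - e^{-\mu \Delta}} - \frac{J\,(e^{-\mu \Delta})^{J}}{1 - (e^{-\mu \Delta})^{J}} = \frac{\bar{n}(t_s)}{n(t_s)} = r. \tag{3.10} $$

By letting $x = e^{-\mu \Delta}$, equation (3.10) becomes

$$ \frac{x}{1-x} - \frac{J x^{J}}{1 - x^{J}} = \frac{\bar{n}(t_s)}{n(t_s)} = r. \tag{3.11} $$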



x is solved for iteratively. For the first iteration the second
term on the left of (3.11) is neglected (equivalent to letting
J=0), then

$$ r(1) = \frac{x(1)}{1 - x(1)} = \frac{\bar{n}(t_s)}{n(t_s)}, \tag{3.12} $$

$$ x(1) = \frac{\bar{n}(t_s)}{\bar{n}(t_s) + n(t_s)} = \frac{r(1)}{1 + r(1)}. \tag{3.13} $$

r(2) is

$$ r(2) = r(1) + \frac{J\, x(1)^{J}}{1 - x(1)^{J}}, \tag{3.14} $$

x(2) is given by

$$ x(2) = \frac{r(2)}{1 + r(2)}. \tag{3.15} $$

Hence the iteration of r(n) and x(n) is

$$ r(n+1) = r(1) + \frac{J\, x(n)^{J}}{1 - x(n)^{J}}, \tag{3.16} $$

$$ x(n+1) = \frac{r(n+1)}{1 + r(n+1)}. \tag{3.17} $$

The iterative process continues until $|x(n+1) - x(n)| < \epsilon$, where
$\epsilon$ is a suitable small number; x(n+1) is then substituted into
equation (3.5), using $e^{-\mu t_s} = x^{J}$, to get $\lambda$. Using the
estimates of $\mu$ and $\lambda$, the expected number of faults to be
observed in some additional operating time $t_o$, where
$(t_s, t_s + t_o)$ is of length $k\Delta$, can be estimated as

$$ \hat{\lambda}(t_s + t_o) - \hat{\lambda}(t_s) = \hat{\lambda}\, e^{-\hat{\mu} t_s}\left(1 - e^{-\hat{\mu} t_o}\right). $$

A Bayesian methodology is discussed in the appendix. This
method attempts to utilize past experience from software
projects having similar characteristics as the software in
question. If the distributions of $\lambda$ and $\mu$ are known from
experience then this information can be useful in estimating
the parameters $\lambda$ and $\mu$.
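
The maximum likelihood iteration (3.11) through (3.17) can be sketched in Python as follows. This is an illustrative sketch, not the program used for the results in this thesis: the function names, the tolerance, the iteration cap, and the sample counts are assumptions. Each pass adds the correction term to the fixed starting ratio r(1), so that the limit of the iteration satisfies (3.11). The guard reflects the fact that the left side of (3.11) approaches (J-1)/2 as x approaches 1, so no solution with positive $\mu$ exists when r(1) is at or above that value.

```python
import math


def fit_nspp(counts, delta, eps=1e-10, max_iter=100000):
    """Fixed-point solution of (3.11) for x = exp(-mu * delta) (sketch).

    counts[j-1] is the number of faults n_j in the jth interval of
    length delta. Returns (lam_hat, mu_hat, x).
    """
    big_j = len(counts)
    n = sum(counts)
    n_bar = sum(j * c for j, c in enumerate(counts))   # sum of (j-1) * n_j
    r1 = n_bar / n
    if not 0.0 < r1 < (big_j - 1) / 2.0:
        raise ValueError("no solution with 0 < mu < infinity for these counts")
    x = r1 / (1.0 + r1)                                # x(1), eq. (3.13)
    for _ in range(max_iter):
        r = r1 + big_j * x ** big_j / (1.0 - x ** big_j)   # eq. (3.16)
        x_new = r / (1.0 + r)                              # eq. (3.17)
        converged = abs(x_new - x) < eps
        x = x_new
        if converged:
            break
    mu_hat = -math.log(x) / delta
    lam_hat = n / (1.0 - x ** big_j)                   # eq. (3.5), t_s = J*delta
    return lam_hat, mu_hat, x


def expected_additional_faults(lam_hat, x, big_j, k):
    """Expected faults in the next k intervals after the J observed ones."""
    return lam_hat * x ** big_j * (1.0 - x ** k)


if __name__ == "__main__":
    counts = [20, 14, 10, 7, 5]         # hypothetical fault counts
    lam_hat, mu_hat, x = fit_nspp(counts, delta=1.0)
    print(lam_hat, mu_hat, expected_additional_faults(lam_hat, x, 5, 1))
```

The iteration converges linearly and slows as x approaches 1, so a bracketing method such as bisection may be preferable when the counts decay only weakly.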
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B . BOOTSTRAP 



Bootstrapping was used to obtain the confidence limits for 
p, X, and E [N (t Q ) -N ( t,) ] = E[AN(t 0 )]. This technique takes 

into account the sampling uncertainties in the estimates by 
removing the errors in the standard approximation (Dalai et 
al, 1989) and (Efron, 1985). To obtain the estimates of the 
sampling variability of p, X, and E [N(t c ) - N(t s ) ] =E [A(t 0 ) ] 
proceed as follows. The probability that a count occurs in 
the jth period is conditional on N(t,)=n(t,): 

PiN^xii N J =n J \N 1 +N 2 + . . .+Nj=n(t s ) } (3.20a) 



»\7 -n ( t s ) ! 




(3 .20b) 



where EX,= l-e- ,,lA . From this the probability that a count falls 
in the jth interval is 



1 - e -iPA 

P . = — . 

J i - e - Jp4 



(3.21) 



Uniform(0,1) random numbers U_k, k = 1, 2, ..., n(t_s), were generated, where U_k is the kth random number. If P_{j−1} < U_k ≤ P_j, then a count is added to n_j. The simulated n_j's were then used to re-estimate μ, λ, and E[ΔN(t_o)]; these are the bootstrap values. This process was repeated 1000 times to get a range of values for μ, λ, and E[ΔN(t_o)]. To create a 90% confidence limit for the estimate E[ΔN(t_o)], the 1000 bootstrap estimates of E[ΔN(t_o)] were ordered and the 50th and 950th ordered values (the 5% and 95% quantiles) were found. These are quoted as the 90% confidence region (E[ΔN(t_o)]_5, E[ΔN(t_o)]_95).

C. RESULTS

The estimates for the parameters were obtained using three different Δ values and three different t_s values. The value t_o was selected such that t_o + t_s ≈ the time of the last observed fault; this allows the predicted expected number of faults to be compared with the observed data. Tables 1 through 6 give the 90% confidence intervals obtained by the bootstrap.

The most difficult aspect of this thesis research was obtaining appropriate test data. The data that I received from various sources were unacceptable for various reasons: no testing history; severity of faults not listed; no milestone events listed (e.g., one data set covered 10 years but gave no indication of modifications to the software); non-software errors listed with software errors; and descriptions of errors that could not be interpreted (interpretable descriptions might have eliminated some of the problems mentioned above). The underlying cause is that the organizations I contacted for data do not use any systematic method for determining software reliability. A "warm fuzzy feeling" for the software seems to be the current method used to judge its reliability. This feeling gets warmer and fuzzier as deadlines draw closer.

The data sets used in the analysis of the model were obtained from a technical report on other software reliability models (Abdalla et al., 1986). The data were given as time (CPU) between failures. The results of the bootstrap for Data Set 1 are given in Tables 1-3; the graphical results (Dalal, 1990) are depicted in Figures 1-3. The results of the bootstrap for Data Set 2 are given in Tables 4-6; the graphical results (Dalal, 1990) are depicted in Figures 4-6.
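Because the data were given as times between failures while the fitting procedure works with the number of faults in each period of length Δ, a conversion step is needed. A minimal sketch of one way to do it follows; the function name and the boundary convention (a failure exactly on a period boundary is counted in the earlier period) are assumptions made for illustration.

```python
import math
from itertools import accumulate

def group_counts(interfailure_times, delta, t_s):
    """Count failures in consecutive periods of length delta up to test time t_s."""
    num_periods = int(t_s // delta)
    horizon = num_periods * delta
    counts = [0] * num_periods
    for t in accumulate(interfailure_times):  # cumulative failure times
        if t > horizon:
            break
        # A failure exactly on a period boundary is assigned to the earlier period.
        index = min(math.ceil(t / delta) - 1, num_periods - 1)
        counts[max(index, 0)] += 1
    return counts

# Hypothetical interfailure times (CPU minutes), used only to show the grouping.
print(group_counts([3, 4, 5, 10, 20], delta=10, t_s=40))
```

The resulting list holds n_1, ..., n_J, and its sum is n(t_s).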

D. USE OF RESULTS 

Suppose a time t_s has been spent testing the software, and n(t_s) faults were found. The n(t_s) faults can be broken up into the n_j's, the number of faults in each period j of size Δ (Σn_j = n(t_s)). This information can be used to estimate the parameters μ and λ, and to obtain a point estimate of the mean or expected number of faults to appear in the time interval (t_s, t_s + t_o), where t_o is the time that operational testing of the system will require. Bootstrapping can then be done to assess the sampling uncertainty in the estimate of the expected number of faults to appear in (t_s, t_s + t_o); this is done by quoting bootstrapped 90% confidence limits. The expected number of faults predicted to occur can be compared to the requirements of the system; for example, at most F faults may be allowed in some time t_o (supposing F can be specified). If the predicted expected number of faults is less than the allowable number of faults, then system operational testing might be worth the expense at this time. If, in contrast, the expected number of faults is greater than the specified number, then system operational testing should be postponed. Testing should continue in the lab, at the developmental level, until t_s and n(t_s) are large enough that the expected number of faults for the required operational time meets the specification.

A more conservative approach is to replace the estimate of 
the mean number of faults by the upper confidence limit of the 
mean number of faults. Such a conservative approach is 
recommended. 
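The comparison against the allowable number of faults F, including the conservative variant, can be written down in a few lines. This is only a sketch of the decision rule as described; the function and argument names are hypothetical, and the numbers in the usage lines are illustrative rather than requirements of any real system.

```python
def operational_test_decision(point_estimate, upper_limit, allowed_faults, conservative=True):
    """Return True if operational testing appears warranted under the stated rule.

    With conservative=True the upper confidence limit, rather than the point
    estimate, is compared against the allowable number of faults F.
    """
    predicted = upper_limit if conservative else point_estimate
    return predicted < allowed_faults

# Hypothetical requirement F = 10, with an upper limit of 6.09 faults.
print(operational_test_decision(point_estimate=4.0, upper_limit=6.09, allowed_faults=10))
```

Using the upper limit makes the rule stricter: a case that passes on the point estimate alone may still be postponed.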

If there are no specifications, the individual responsible for scheduling system operational testing will have to make a subjective decision: is the expected number of faults to occur in (t_s, t_s + t_o) small enough to warrant spending the money to carry out system operational testing, or should this testing be postponed until the expected number of faults is lower? The assumption is that lab testing will continue on the software, increasing t_s and n(t_s) but reducing the number of unfound and uncorrected faults. The more faults found in lab testing of the software, the fewer faults are likely to occur in the more costly system operational testing.
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E. APPLICATION TO TWO DATA SETS 



The fitting and error assessment procedure was applied to two data sets (Abdalla et al., 1986). Figures 1, 2, and 3 refer to Data Set 1; Figures 4, 5, and 6 to Data Set 2.

Figure 1 has a Δ of 10 CPU minutes with three combinations of t_s and t_o. If the range of the expected number of faults for t_s = 1250, t_o = 250 (2.21 to 6.09) is acceptable, the software manager may choose to schedule operational testing. The same argument can be made for t_s = 1000, t_o = 500. A problem occurs for t_s = 500 and t_o = 1000. If the range for the expected number of faults to occur (4.69 to 22.22) is acceptable, the software manager may choose to schedule operational testing. Unfortunately, 46 faults occur in (t_s, t_s + t_o). This is extremely likely to be the result of using an inappropriate model (it does seem unlikely that software with as many as 22 mission-critical faults would be viewed as acceptable for starting operational testing). What can the software manager do to prevent something like this from occurring? Ideally, as testing continues, the rate at which faults occur should decrease (assuming a constant relative rate of testing), with that rate asymptotically approaching zero as t_s becomes large. The slope of the estimated total expected number of faults versus test time for Data Set 1 from T = 300 to T = 500 is m = 0.08 (faults/CPU min). Figure 1 depicts this: the rate at which faults are occurring does not appear to be tapering off. The software manager can use this information to support a decision to go ahead with (or postpone) operational testing. From T = 1000 to T = 1500 the slope is 0.028 (faults/CPU min) and appears to be tapering off. The range of the expected number of faults to occur in the specified t_o accurately reflects what actually occurred. If the range of the expected number of faults is acceptable, the software manager should go ahead with operational testing. Figure 2 (Δ = 20 CPU minutes) and Figure 3 (Δ = 50 CPU minutes) can be interpreted similarly.
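The slope check used above can be sketched as an average rate of the fitted curve over a test-time window. The sketch assumes an exponential mean value function λ(1 − e^{−μt}); the function names and the parameter values are hypothetical and are chosen only to show a slope that tapers off, so they do not reproduce the slopes quoted for Data Set 1.

```python
import math

def mean_faults(lam, mu, t):
    """Expected cumulative faults by test time t (assumed exponential form)."""
    return lam * (1.0 - math.exp(-mu * t))

def average_slope(lam, mu, t1, t2):
    """Average fault rate (faults per unit test time) of the fitted curve on [t1, t2]."""
    return (mean_faults(lam, mu, t2) - mean_faults(lam, mu, t1)) / (t2 - t1)

# Hypothetical parameter values, chosen only to show the slope tapering off.
early = average_slope(lam=140.0, mu=0.002, t1=300.0, t2=500.0)
late = average_slope(lam=140.0, mu=0.002, t1=1000.0, t2=1500.0)
print(early > late)
```

A late-window slope that is not clearly smaller than the early-window slope is the warning sign discussed above for t_s = 500.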

The change in Δ for both data sets did not have a significant impact on the range of the expected number of faults to occur, indicating that the model is somewhat insensitive to the size of Δ.

Data Set 2 (Figures 4, 5, and 6) shows only a small indication of the slope decreasing. This is why the confidence limits of the expected number of faults are so wide. The software manager can apply the same techniques listed above to make a decision to schedule (or postpone) operational testing. The software manager must repeatedly address the questions: is the rate of occurrence of faults lessening, and is the range of the expected number of faults acceptable to support operational testing?

A fitted model may indicate a narrowing range of the expected number of faults and a slope asymptotically approaching zero; consequently, the software manager schedules operational testing. Unfortunately, the results of the operational testing may be poor, i.e., a relatively large number of errors may occur, indicating that more developmental activity and testing are required to improve the software. For example, the model predicts n(t_o) ≈ 22 for Data Set 1 (t_s = 500, t_o = 1000), but the number of observed faults that occurred in t_o was 46, more than twice the predicted amount. This example illustrates the relationship between modeling and testing. While systematic underestimation indicates flaws in the model, occasional underestimation simply reinforces that software reliability models do not take the place of stressing software within a full system in a real-life operational environment. The purpose of this thesis is to provide the software manager with a tool to aid in the decision as to when to initiate operational testing, not to replace such a test.
TABLE 1

ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 1250, t_o = 250 (CPU MINUTES)
Observed number of bugs in t_o is 6
90% Confidence Interval

Δ (CPU min)   Limit   μ         λ         E[N(t_o)]
10            5%      0.00272   134.595    2.21
              95%     0.00176   146.509    5.76
20            5%      0.00270   134.993    2.32
              95%     0.00174   147.798    6.09
50            5%      0.00270   135.258    2.25
              95%     0.00175   148.142    5.82



TABLE 2

ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 1000, t_o = 500 (CPU MINUTES)
Observed number of bugs in t_o is 14
90% Confidence Interval

Δ (CPU min)   Limit   μ         λ         E[N(t_o)]
10            5%      0.00298   128.701    5.03
              95%     0.00177   147.640   14.73
20            5%      0.00298   128.969    5.07
              95%     0.00176   148.393   14.81
50            5%      0.00296   129.828    5.17
              95%     0.00175   150.549   14.96



TABLE 3

ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 500, t_o = 1000 (CPU MINUTES)
Observed number of bugs in t_o is 46
90% Confidence Interval

Δ (CPU min)   Limit   μ         λ         E[N(t_o)]
10            5%      0.00600    95.010    4.69
              95%     0.00327   112.711   20.97
20            5%      0.00600    95.352    4.70
              95%     0.00326   113.863   21.14
50            5%      0.00588    96.859    5.00
              95%     0.00317   118.432   22.22
TABLE 4

ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 800, t_o = 300 (CPU SECONDS)
Observed number of bugs in t_o is 12
90% Confidence Interval

Δ (CPU sec)   Limit   μ         λ         E[N(t_o)]
10            5%      0.00288    82.479    4.75
              95%     0.00111   126.998   14.69
20            5%      0.00288    82.718    4.73
              95%     0.00111   127.645   14.64
50            5%      0.00287    83.722    4.78
              95%     0.00111   131.003   14.66



TABLE 5

ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 600, t_o = 500 (CPU SECONDS)
Observed number of bugs in t_o is 21
90% Confidence Interval

Δ (CPU sec)   Limit   μ         λ         E[N(t_o)]
10            5%      0.00298    78.513   10.10
              95%     0.00068   195.611   37.09
20            5%      0.00296    79.189   10.21
              95%     0.00067   200.950   37.32
50            5%      0.00298    80.710   10.14
              95%     0.00067   211.307   37.43



TABLE 6

ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 400, t_o = 700 (CPU SECONDS)
Observed number of bugs in t_o is 37
90% Confidence Interval

Δ (CPU sec)   Limit   μ         λ         E[N(t_o)]
10            5%      0.00456    58.950    9.03
              95%     0.00058   239.964   62.43
20            5%      0.00458    59.423    8.96
              95%     0.00054   263.011   63.88
50            5%      0.00446    62.014    9.45
              95%     0.00047   325.387   66.55
Figure 1. Data Set 1, Δ = 10 (CPU minutes)

Figure 2. Data Set 1, Δ = 20 (CPU minutes)

Figure 3. Data Set 1, Δ = 50 (CPU minutes)

Figure 5. Data Set 2, Δ = 20 (CPU seconds)

Figure 6. Data Set 2, Δ = 50 (CPU seconds)
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IV. CONCLUSION 



Software reliability models are useful tools that managers of software-intensive projects have at their disposal. The bootstrapping technique provides the manager with a range for the expected number of faults estimated to occur in some additional operating time. The question is whether the upper limit of the expected number of faults estimated to occur is acceptable. The potential risks are additional cost for further testing or late product delivery. The ideal case is reliable software delivered on time and on budget. Unfortunately, reality is rarely ideal. The software manager must decide: is it better to deliver a product on time that may be considered unreliable by the user and be sent back for further testing, or to deliver a product late but of acceptable quality to the user? The purpose of this thesis is to provide a quantitative tool for the manager who may have to make such qualitative decisions. The use of software reliability models is not without associated cost and risk. The data must be collected for input to the model. Recommendations for the type of data that should be collected are:

• Operating time between failures (CPU time is the best) (Musa and Okumoto, 1984).

• Calendar time between failures, although such times may not accurately reflect the opportunity for faults to reveal themselves (Musa et al., 1987).

• Testing history, e.g., how many people are involved in the testing effort.

• How the software was tested.

• Intensity of the software testing.

• Cost of testing, i.e., the cost to find and repair a fault before and after product delivery.

Without useful data a reliability model has little 
practical use. The model presented in this thesis should be 
validated using data from several Navy systems. 

There are several areas for further research. How accurate are the predicted confidence limits in this model? What are the limits of applicability of this model? What effect do inaccuracies (due to replacing observed data with hypothesized data in cases where insufficient data are available) have on the model, i.e., how robust is the model? Further development of other software reliability models should be pursued. Emphasis should be placed on obtaining confidence limits in addition to a point estimate of the expected number of failures predicted to appear in some additional testing time. These models should be verified using data obtained from Navy software-intensive systems. It is infeasible to test every possible branch in a large program for faults. The software manager needs technical assistance in identifying where effort and money should be spent to deliver the best possible product. Will many faults in portions of the software that are rarely used/reached cause more problems for the user than a few faults in frequently used/reached portions?
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APPENDIX 



Software projects may have similar characteristics, such as testing strategies or architecture, so that the information obtained about the reliability of one software project may be used to aid in predicting the reliability of another, similar software project. This process can make use of Bayesian methodology (Dalal and Mallows, 1990; Farr, 1983). If prior distributions of λ and μ are specified, then this information can be used to help estimate the parameters λ and μ; the posterior for these is

$$p_{\Lambda,M}(\lambda, \mu) = K\, L(\lambda, \mu)\, p_\lambda(\lambda)\, p_\mu(\mu), \qquad (a.1a)$$

where p_λ(λ) and p_μ(μ) are the prior distributions of λ and μ estimated from another software project that has characteristics similar to the software project currently being tested. The simplest idea is to integrate out λ and marginalize on μ, which yields:




(a . lb) 



p .< » )mK C 



(l-e-f 4 )” 11 -’ . (a. 2) 
The most convenient choice of p_λ(λ) is the (conjugate) Gamma:

$$p_\lambda(\lambda) = \frac{\alpha^{\beta} \lambda^{\beta-1} e^{-\alpha\lambda}}{\Gamma(\beta)}, \qquad (a.3)$$



which when substituted into equation (a. 2) yields the density, 



* Jo r(p) 



(a . 4a) 



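$$= K'\, e^{-\mu\Delta\,\bar{n}(t_s)} \left(1 - e^{-\mu\Delta}\right)^{n(t_s)} \frac{\int_0^\infty e^{-z} z^{\,n(t_s)+\beta-1}\, dz}{\left(\alpha + 1 - e^{-\mu J\Delta}\right)^{n(t_s)+\beta}} \qquad (a.4b)$$

$$= K''\, e^{-\mu\Delta\,\bar{n}(t_s)} \frac{\left(1 - e^{-\mu\Delta}\right)^{n(t_s)}}{\left(\alpha + 1 - e^{-\mu J\Delta}\right)^{n(t_s)+\beta}}. \qquad (a.4c)$$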

Using an uninformative prior, α = 0, β = 0, and setting x = e^{−μΔ}, equation (a.4c) becomes

$$p_x(x) = K'\, \frac{x^{\bar{n}(t_s)} (1 - x)^{n(t_s)}}{\left(\alpha + 1 - x^{J}\right)^{n(t_s)+\beta}}. \qquad (a.5)$$



The mode of the density is found by maximizing

$$T(x) = \ln(p_x(x)) = \bar{n}(t_s) \ln x + n(t_s) \ln(1 - x) - (n(t_s) + \beta) \ln(\alpha + 1 - x^{J}). \qquad (a.6)$$

Taking the derivative of equation (a.6) with respect to x and setting it to zero yields:

$$\frac{\bar{n}(t_s)}{n(t_s)} = \frac{x}{1 - x} - \frac{n(t_s) + \beta}{n(t_s)} \cdot \frac{J x^{J}}{\alpha + 1 - x^{J}}. \qquad (a.7)$$

If α = β = 0, equation (a.7) is the same as equation (3.11), which gives the MLE.

Suppose m = E[λ] and σ² = Var[λ] in the prior; then α = m/σ² and β = m(m/σ²). Equation (a.7) becomes

$$\frac{x}{1 - x} - \frac{n(t_s) + m(m/\sigma^2)}{n(t_s)} \cdot \frac{J x^{J}}{(m/\sigma^2) + 1 - x^{J}} = \frac{\bar{n}(t_s)}{n(t_s)}. \qquad (a.8)$$
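The expressions for α and β follow from matching the first two moments of the Gamma prior. The short derivation below assumes the parameterization in which α is the rate and β is the shape; that reading is an assumption, chosen because it is the one consistent with α = m/σ² and β = m(m/σ²).

```latex
\begin{aligned}
E[\lambda] &= \frac{\beta}{\alpha} = m, \qquad
\operatorname{Var}[\lambda] = \frac{\beta}{\alpha^{2}} = \sigma^{2} \\
\Rightarrow\quad \alpha &= \frac{E[\lambda]}{\operatorname{Var}[\lambda]} = \frac{m}{\sigma^{2}}, \qquad
\beta = \alpha\, m = \frac{m^{2}}{\sigma^{2}} .
\end{aligned}
```

As a hypothetical illustration, a prior mean of m = 100 faults with variance σ² = 400 gives α = 0.25 and β = 25.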

If λ is interpreted as the total number of faults in a particular software project, then the number of faults is discrete, so a discrete distribution should be used for the prior; e.g., one could use a Poisson for the prior. However, it is easier to work with a Gamma distribution. If the Gamma distribution has the same mean and variance as a Poisson, then equation (a.8) is (since m = σ²)

$$\frac{x}{1 - x} - \frac{n(t_s) + m}{n(t_s)} \cdot \frac{J x^{J}}{2 - x^{J}} = \frac{\bar{n}(t_s)}{n(t_s)}. \qquad (a.9)$$

It is clear that the variance-to-mean ratio of the prior has a strong influence on the effect of a prior estimate of the mean.

One Bayesian approach to estimation is to find the mean (rather than the mode, or highest point of the posterior, as is essentially done in the likelihood approach) of the (approximate) posterior, 0 ≤ x ≤ 1. To obtain an approximate posterior mean, proceed as follows. If J is large, x^J is small provided x < 1, so expand in a Taylor series to get

$$p_x(x) = K'' \left[ x^{\bar{n}} (1 - x)^{n} + \left( \frac{n + \beta}{1 + \alpha} \right) x^{\bar{n} + J} (1 - x)^{n} \right], \qquad (a.10)$$

where n = n(t_s) and n̄ = n̄(t_s).

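Equation (a.10) is a convex combination of two beta densities. K'' can be found by setting the left-hand side of (a.11) equal to 1. E[x] = E[e^{−μΔ}] can be found:

$$E[x] = K'' \left[ \frac{\Gamma(\bar{n}+1)\,\Gamma(n+1)}{\Gamma(\bar{n}+n+1)} \cdot \frac{\bar{n}+1}{\bar{n}+n+1} + \left( \frac{n+\beta}{1+\alpha} \right) \frac{\Gamma(\bar{n}+J+1)\,\Gamma(n+1)}{\Gamma(\bar{n}+J+n+1)} \cdot \frac{\bar{n}+J+1}{\bar{n}+J+n+1} \right] \qquad (a.11)$$

The approximation to this is

$$\frac{\dfrac{\bar{n}!}{(\bar{n}+n)!} \cdot \dfrac{\bar{n}+1}{\bar{n}+n+1} + \dfrac{n+\beta}{1+\alpha} \cdot \dfrac{(\bar{n}+J)!}{(\bar{n}+J+n)!} \cdot \dfrac{\bar{n}+J+1}{\bar{n}+J+n+1}}{\dfrac{\bar{n}!}{(\bar{n}+n)!} + \dfrac{n+\beta}{1+\alpha} \cdot \dfrac{(\bar{n}+J)!}{(\bar{n}+J+n)!}} = e^{-\mu\Delta}. \qquad (a.12)$$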



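Unfortunately, n = n(t_s) = 136 for Data Set 1; even with factoring out n = n(t_s), the factorial ratios are on the order of 10^{−300}.

However, it is justifiable to use an approximation to the factorials to get

$$x \approx \frac{\dfrac{\bar{n}+1}{\bar{n}+n+1} + \dfrac{n+\beta}{1+\alpha} \cdot \dfrac{\bar{n}+J+1}{\bar{n}+n+J+1} \left( \dfrac{\bar{n}+1}{\bar{n}+n+1} \right)^{J}}{1 + \dfrac{n+\beta}{1+\alpha} \left( \dfrac{\bar{n}+1}{\bar{n}+n+1} \right)^{J}}. \qquad (a.13)$$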
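The underflow of the factorial ratios noted above can also be sidestepped numerically by working with logarithms. The sketch below uses the log-gamma function; this is a swapped-in technique, not the approximation of equation (a.13), and the function name and the second argument (272) are hypothetical values picked only to produce a ratio of the stated magnitude.

```python
import math

def log_factorial_ratio(a, b):
    """Return ln(a! / b!) using log-gamma, so no intermediate factorial is formed."""
    return math.lgamma(a + 1) - math.lgamma(b + 1)

# 136! / 272! lies below the smallest normal double, but its logarithm is well behaved.
log_ratio = log_factorial_ratio(136, 272)
print(log_ratio / math.log(10))  # base-10 exponent of the ratio
```

Ratios of two such terms can then be formed by subtracting their logarithms and exponentiating only the difference.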
The numerical results of equation (a.13) are in Tables A1 through A6 for Data Sets 1 and 2. The graphical results are shown in Figures A1 through A6. The range of the estimated number of faults to occur in (t_s, t_s + t_o) is much smaller than that of the bootstrap results discussed in Chapter III. None of the ranges (estimated number of faults to occur) obtained using the Bayesian method contains the observed number of faults. A possible explanation for this is inappropriate values for α and β (α = β = 0). After various projects have been analyzed with software reliability models, fault distributions may become more apparent. This information can then be incorporated into reliability models. I feel that, despite the surprising initial results, this method does promise to be a useful tool for the software manager.
TABLE A1

BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 1250, t_o = 250 (CPU MINUTES)
Observed number of bugs in t_o is 6
90% Confidence Interval

Δ (CPU min)   Limit   μ         λ         E[N(t_o)]
10            5%      0.00339   131.892   1.08
              95%     0.00263   135.028   2.42
20            5%      0.00340   131.870   1.10
              95%     0.00264   134.992   2.48
50            5%      0.00339   131.914   1.09
              95%     0.00262   135.103   2.45



TABLE A2

BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 1000, t_o = 500 (CPU MINUTES)
Observed number of bugs in t_o is 14
90% Confidence Interval

Δ (CPU min)   Limit   μ         λ         E[N(t_o)]
10            5%      0.00399   124.289   1.98
              95%     0.00311   127.719   4.51
20            5%      0.00399   124.304   1.99
              95%     0.00310   127.768   4.54
50            5%      0.00398   124.328   2.01
              95%     0.00309   127.828   [illegible]



TABLE A3

BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 500, t_o = 1000 (CPU MINUTES)
Observed number of bugs in t_o is 46
90% Confidence Interval

Δ (CPU min)   Limit   μ         λ         E[N(t_o)]
10            5%      0.00808   91.608    1.61
              95%     0.00602   94.660    4.65
20            5%      0.00809   91.603    1.60
              95%     0.00601   94.697    4.69
50            5%      0.00797   91.708    1.71
              95%     0.00596   94.805    4.79
TABLE A4

BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 800, t_o = 300 (CPU SECONDS)
Observed number of bugs in t_o is 12
90% Confidence Interval

Δ (CPU sec)   Limit   μ         λ        E[N(t_o)]
10            5%      0.00464   75.849   1.39
              95%     0.00340   79.192   3.32
20            5%      0.00464   75.846   1.39
              95%     0.00340   79.216   3.34
50            5%      0.00465   75.837   1.38
              95%     0.00339   79.242   3.35



TABLE A5

BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 600, t_o = 500 (CPU SECONDS)
Observed number of bugs in t_o is 21
90% Confidence Interval

Δ (CPU sec)   Limit   μ         λ        E[N(t_o)]
10            5%      0.00600   66.830   1.74
              95%     0.00429   70.363   4.74
20            5%      0.00600   66.828   1.74
              95%     0.00429   70.361   4.73
50            5%      0.00596   66.872   1.78
              95%     0.00429   70.368   4.74



TABLE A6

BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 400, t_o = 700 (CPU SECONDS)
Observed number of bugs in t_o is 37
90% Confidence Interval

Δ (CPU sec)   Limit   μ         λ        E[N(t_o)]
10            5%      0.00897   50.391   1.39
              95%     0.00625   53.386   4.33
20            5%      0.00893   50.416   1.41
              95%     0.00624   53.397   4.34
50            5%      0.00891   50.426   1.42
              95%     0.00622   53.432   4.38
Figure A1. Data Set 1, Δ = 10 (CPU minutes), Bayesian Method

Figure A2. Data Set 1, Δ = 20 (CPU minutes), Bayesian Method

Figure A3. Data Set 1, Δ = 50 (CPU minutes), Bayesian Method

Figure A4. Data Set 2, Δ = 10 (CPU seconds), Bayesian Method

Figure A5. Data Set 2, Δ = 20 (CPU seconds), Bayesian Method

Figure A6. Data Set 2, Δ = 50 (CPU seconds), Bayesian Method
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